How does my company's sales data compare across different states over time?

In [1]:
import pandas as pd
import numpy as np
import json
import plotly.express as px
from datetime import datetime as dt

import warnings
warnings.filterwarnings('ignore')

Introduction (5 min)

Business Context. You are a business intelligence analyst for a supermarket conglomerate. Your company has stores all over the United States. The company has collected data consisting of line-level order information from all its stores. The company wishes to compare sales data across different months and geographies and make this information available to executive members and key shareholders. If each person has access to an interactive dashboard, it can offer a foundation for further dialogue and great decision making.

Business Problem. The company wants you to make an interactive dashboard that company executives can use to visualize sales and profits across supermarket locations over various time periods. The dashboard should be usable via a web browser like Chrome on the company's intranet.

Analytical Context. In the current case, we will be using Dash by Plotly to develop the dashboard. Dash is an open-source Business Intelligence (BI) platform which allows us to develop complicated interactive dashboards using only Python. It also lets us connect to various data sources and is easy to deploy as a web application.

The case is structured as follows: you will (1) learn about the basic components of Dash; (2) walk through the basic elements of making a dashboard; and finally (3) make a dashboard that compares sales and profit over time across various states.

Understanding the data (10 min)

Let's read the data into a pandas DataFrame and look at all the information we have:

In [2]:
df = pd.read_csv('App/Data/superstore.csv', parse_dates=['Order Date', 'Ship Date'])
df.head()
Out[2]:
Row ID Order ID Order Date Ship Date Ship Mode Customer ID Customer Name Segment Country City ... Postal Code Region Product ID Category Sub-Category Product Name Sales Quantity Discount Profit
0 1 CA-2016-152156 2016-08-11 2016-11-11 Second Class CG-12520 Claire Gute Consumer United States Henderson ... 42420 South FUR-BO-10001798 Furniture Bookcases Bush Somerset Collection Bookcase 261.9600 2 0.00 41.9136
1 2 CA-2016-152156 2016-08-11 2016-11-11 Second Class CG-12520 Claire Gute Consumer United States Henderson ... 42420 South FUR-CH-10000454 Furniture Chairs Hon Deluxe Fabric Upholstered Stacking Chairs,... 731.9400 3 0.00 219.5820
2 3 CA-2016-138688 2016-12-06 2016-06-16 Second Class DV-13045 Darrin Van Huff Corporate United States Los Angeles ... 90036 West OFF-LA-10000240 Office Supplies Labels Self-Adhesive Address Labels for Typewriters b... 14.6200 2 0.00 6.8714
3 4 US-2015-108966 2015-11-10 2015-10-18 Standard Class SO-20335 Sean O'Donnell Consumer United States Fort Lauderdale ... 33311 South FUR-TA-10000577 Furniture Tables Bretford CR4500 Series Slim Rectangular Table 957.5775 5 0.45 -383.0310
4 5 US-2015-108966 2015-11-10 2015-10-18 Standard Class SO-20335 Sean O'Donnell Consumer United States Fort Lauderdale ... 33311 South OFF-ST-10000760 Office Supplies Storage Eldon Fold 'N Roll Cart System 22.3680 2 0.20 2.5164

5 rows × 21 columns

In [12]:
df.columns
Out[12]:
Index(['Row ID', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode',
       'Customer ID', 'Customer Name', 'Segment', 'Country', 'City', 'State',
       'Postal Code', 'Region', 'Product ID', 'Category', 'Sub-Category',
       'Product Name', 'Sales', 'Quantity', 'Discount', 'Profit'],
      dtype='object')

The data for the superstore has line-level data for each product purchased across different locations. The full list of features available is below:

  1. Order ID: ID of the order
  2. Order Date: Date of the order
  3. Ship Date: Shipping Date of the order
  4. Ship Mode: Shipping medium
  5. Customer ID: ID of the customer
  6. Customer Name: Name of the customer
  7. Segment: Consumer or corporate Sale
  8. Country: Country of store
  9. City: City of store
  10. Postal Code: Postal code of store
  11. Region: Region of store
  12. Product ID: ID of the product
  13. Category: Category of product
  14. Sub-Category: Sub-category of product
  15. Product Name: Name of product
  16. Sales: Value of sale
  17. Quantity: Quantity of product sold
  18. Discount: Discount offered
  19. Profit: Profit on sale

Exercise 1: (7 min)

Before we start with the development of our dashboard, it is important to determine what kind of information and visualizations might be useful for our client. Which of the following plots you think is important to include in our dashboard? Select all that apply.

I. A boxplot that shows the distribution of quantities sold on each sale across time.

II. A map of the Unite States that highlights the value of sales of their products across the states.

III. A scatter plot of the value vs. profit of each sale.

IV. A line plot comparing sales across all cities over time.

V. A line line comparing sales across (user) selected states over time.

Answer. II, III, and V are most applicable in this case:

I. Such boxplot can be helpful in deciding stock quantities of certain product. However, our client is interested in understanding the performance of sales and profits of the company. Thus, this plot might not be of relevance for this case.

II. With a map the client will be able to compare the sales across the states fairly quickly. For instance, the client will be able to recognize which states are doing poorly and thus start planning strategies for improve performance on these states.

III. This plot might be useful to detect sales that were not profitable. We could further use labels or "hover text" to show information of these sales that can help the client in understanding why such sales were bad.

IV. and V. Line plots can be helpful to visualize trends in sales across different locations. However, the number lines shown cannot be large as the plot might end up being useless to the client. By grouping the data across selected states, the client will have access to a better overview of the sale history.

Creating basic plots using Plotly Express (35 min)

In this case, we will use Plotly Express to create the charts that will be in our dashboard. One of the great advantages of Plotly Express is its ease of use and flexibility when creating interactive plots. For instance, we can create a simple sales vs. profit scatter plot as follows:

In [3]:
px.scatter(df, x="Sales", y="Profit", color="Category")  

Notice how it is really easy to zoom in on certain regions of the plot (double-click to go to the standard view). We can also see that hover-over text appears when we place the cursor on top of a point in the plot. We can add further information to the hover-over data as follows:

In [14]:
px.scatter(df, x="Sales", y="Profit", color="Category", hover_data=['State','Sub-Category','Order ID','Product Name'])  

Exercise 2 (10 min):

Create a line plot which shows the history of the company's total monthly sales in the states of California, New York, and Texas. The plot should have three lines (one per state). Do you see any trends in the sales across these states?

Hint: Use the px.line command to create a line plot. Create a new column which extracts the month and year of each sale with the command:

df['Order_Month'] = pd.to_datetime(df['Order Date'].map(lambda x: "{}-{}".format(x.year, x.month)))

Answer. One possible solution is shown below:

In [15]:
#As stated in the hint, let us extract the month of the sale:

df['Order_Month'] = pd.to_datetime(df['Order Date'].map(lambda x: "{}-{}".format(x.year, x.month)))

#Next, we filter the data by month and selected states
states=['California', 'Texas','New York']

ddf=df[df['State'].isin(states)]
ddf=ddf.groupby(['State','Order_Month']).sum().reset_index()

px.line(ddf,x="Order_Month",y="Sales", color="State")

Creating a Choropleth US map using Plotly Express (15 min)

A choropleth map is a type of thematic map that is used to visualize statistical data across geographic regions. Choropleth maps are usually created using three basic components: a Map API (such as Google Maps, Bing, or Mapbox) to retrieve updated map information and other functionalities, a GeoJSON file containing coordinates delimiting the areas (polygons) of interest, and the actual data that will be imprinted in the map.

In the Data folder, we have added a GeoJSON file named us.json that contains the geographical data of all the US states. In this GeoJSON, each state has an unique identifier corresponding to its abbreviation: CA, NY, TX, and so on. Since our DataFrame contains full names, we have also included a JSON file named states.json, which is simply a dictionary between full names of states and their abbreviations. Let's load this data and create a new column in our DataFrame with the abbreviations:

In [16]:
with open('App/Data/us.json') as geo:
    geojson = json.loads(geo.read())

with open('App/Data/states.json') as f:
    states_dict = json.loads(f.read())


df['State_abbr'] = df['State'].map(states_dict)

Plotly Express uses Mapbox to create choropleth maps via the function px.choropleth_mapbox (you can also use the px.choropleth function if you have access to a base map). Let's create a choropleth map showing the total sales across all states in the US:

In [17]:
dff=df.groupby('State_abbr').sum().reset_index()

#Here we can see the syntax used for the creation of choropleth maps:

px.choropleth_mapbox(dff,                         #Data
        locations='State_abbr',                   #Column containing the identifiers used in the GeoJSON file 
        color='Sales',                            #Column giving the color intensity of the region
        geojson=geojson,                          #The GeoJSON file
        zoom=3,                                   #Zoom
        mapbox_style="carto-positron",            #Mapbox style, for different maps you need a Mapbox account and a token
        center={"lat": 37.0902, "lon": -95.7129}, #Center
        color_continuous_scale="Viridis",         #Color Scheme
        opacity=0.5,                              #Opacity of the map
        )

Exercise 3: (7 min)

Create a choropleth map that highlights the number of Phones sales from February 4th, 2015 to October 6th, 2016 for the states in the South region. Which states where the top 3 phone sellers? What were their total profits?

Hint: In order to encode the dates use dt(YYYY,MM,DD).date().

Answer. One possible solution is shown below:

In [19]:
start_date=dt(2015,2,4).date()
end_date=dt(2016,10,6).date()

dff = df[(df['Order Date'] >= start_date) & (df['Order Date'] < end_date)] # We filter our dataset for the daterange
dff=dff[(dff['Sub-Category']=='Phones') & (dff['Region']=='South')]
dff=dff.groupby('State_abbr').sum().reset_index()


px.choropleth_mapbox(dff,                        
        locations='State_abbr',                   
        color='Sales',                            
        geojson=geojson,                          
        zoom=3,                                   
        mapbox_style="carto-positron",            
        center={"lat": 37.0902, "lon": -95.7129}, 
        color_continuous_scale="Viridis",         
        opacity=0.5,                              
        hover_data=['Profit']
        )

The most prolific sellers of phones in the South region were North Carolina, Virginia, and Florida. Even though the largest number of sales occurred in North Carolina, Virgina had a much larger profit ($\$1127.89$ vs. $\$413.12$).

Planning our dashboard (5 min)

Let's now revisit the business problem at hand. Recall the original request, which was to allow executive stakeholders the ability to visualize sales and profit across supermarket locations over various time periods. What features should the dashboard have to satisfy this?

  1. The most prevalent location feature in this dataset is State (there is Postal Code, but there are so many postal codes that this would result in the data being broken up so much that it would no longer be easily interpretable). Thus, it makes sense for the dashboard to allow users to filter on sales and profit by state.
  2. We could allow the user to enter their own start and end dates for the visualization. This satisfies the requirement that they can view trends over various time periods.

At this point, it is a good idea to have a sketch in mind for our dashboard. Several examples can be found in the Dash App Gallery, which can be downloaded and used as inspiration for our dashboard. Although many of these dashboards seem complicated, most of them are actually not hard to make.

Basics of Dash (30 min)

Structure of Dash apps (8 min)

At their core, Dash apps are web applications consisting of components which use Flask, ReactJS, Javascript, and Plotly. The authors of Dash have made sure we don't need to interact with any of these components directly via these other frameworks, and instead we can interact with all of them using only Python.

All Dash apps are composed of two parts:

  1. Layout: This consists of the components that we use in our app (e.g. selectors, plots, HTML elements, div, etc.). The layout is composed of HTML components (dash_html_components) and Dash core components (dash_core_components). The layout is set using the layout parameter of the app variable and is a list of all the components that will render on the browser when we launch our Dash application. More specifically:

    • HTML components: Allows us to use common HTML tags in Python such as Div, H1, etc. These are then converted by Dash into HTML for the browser. In addition, we can also pass various properties to the HTML components such as style, className and id. The full list of Dash HTML components is available here: https://dash.plot.ly/dash-html-components.

    • Dash core components: This group of components enables us to add graphs, sliders, inputs, tables, file uploaders, and more to our app using only Python code. Each of these components comes with its own set of parameters that can be configured to change the look, feel, and interactivity. The full list of Dash core components is available here: https://dash.plot.ly/dash-core-components.

  1. Interactivity: This is the cross-filtering of data whereby changing one plot changes other elements of our dashboard. We use Dash callback functions to make our app interactive; these functions take inputs from multiple sources and their output then changes other components of the app. This is similar to programming with Microsoft Excel: whenever an input cell changes, all of the cells that depend on that cell will get updated automatically. This is called reactive programming.

Exercise 4: (15 min)

Let's create our very first dashboard using the following steps. In this exercise, we will use Jupyter Lab; however, you can follow similar steps in your Python IDE of preference (PyCharm, SublimeText, etc.). Just make sure your IDE is running with the provided virtual environment for this case and has the appropriate libraries installed.

4.1 (5 min)

In JupyterLab, locate the file directory where you are running this case. Create new Terminal and Text windows (File->New->Terminal, Text File). In the Text window, paste the following code and save it as Hello_Dash.py, then run the code on the terminal using python Hello_Dash.py. Finally, in your browser go to http://127.0.0.1:8050/. What do you see?

import dash
from dash.dependencies import Input, Output, State, ClientsideFunction
import dash_html_components as html
import plotly.express as px
import pandas as pd
import json

#Create the app
app = dash.Dash(__name__)

#Create Layout
app.layout = html.Div([
    html.H2("US Sales Map", id='title'), #Creates the title of the app

])

#Initiate the server where the app will work
if __name__ == "__main__":
    app.run_server(debug=True)

Answer. We see a webpage with the title "US Sales Map". Notice that we have added an id to the html.H2 element. This id will be helpful when creating app interactivity.

4.2 (10 min)

Add a choropleth map of the company's total sales to the dashboard. In order to do this, first add the choropleth map created before Exercise 3 into the code (you will need to load the data as well) and name the figure Map_Fig. Finally, add a Dash Graph Component into the layout by adding the line:

dcc.Graph(figure=Map_Fig, id='main-figure')

Answer. The new code looks like this:

import dash
from dash.dependencies import Input, Output, State, ClientsideFunction
import dash_html_components as html
import dash_core_components as dcc
import plotly.express as px
import pandas as pd
import json

##################################################################################################
#Load the data and create the map
##################################################################################################
df = pd.read_csv('Data/superstore.csv', parse_dates=['Order Date', 'Ship Date'])

with open('Data/us.json') as geo:
    geojson = json.loads(geo.read())

with open('Data/states.json') as f:
    states_dict = json.loads(f.read())

df['State_abbr'] = df['State'].map(states_dict)


#Create the map:
dff=df.groupby('State_abbr').sum().reset_index()
Map_Fig=px.choropleth_mapbox(dff,                         
        locations='State_abbr',                   
        color='Sales',                            
        geojson=geojson,                          
        zoom=3,                                   
        mapbox_style="carto-positron",            
        center={"lat": 37.0902, "lon": -95.7129}, 
        color_continuous_scale="Viridis",         
        opacity=0.5,                              
        )
################################################################################################

#Create the app
app = dash.Dash(__name__)

#Create Layout
app.layout = html.Div([
    html.H2("US Sales Map", id='title'), #Creates the title of the app
    dcc.Graph(figure=Map_Fig,id='main-figure')

])

#Initiate the server where the app will work
if __name__ == "__main__":
    app.run_server(debug=True)

Finally, let's add a very simple interactivity to our dashboard. Let's create a slider that we can use to change the choropleth map into the scatterplot created just before Exercise 2. We will do this in two steps. Let's start by adding the scatterplot into the code and naming it Scatter_Fig. Then, add a slider component to the layout using the syntax:

dcc.Slider(min=0,max=1,marks={0:'US Map', 1:'Scatter Plot'},value=0,id='fig-slider',)

The new app should look like this. Notice that moving the slider has no effect in the dashboard:

SegmentLocal

Dash let us build interactions between its components with the use of callback functions. In order to create a callback, we simply need to identify the IDs and properties of the components we want to interact with and then create a function that encodes this interaction.

For instance, we want the value property of the slider fig-slider to modify the figure property of the figure main-figure. Moreover, we want value=0 to set figure=Fig_Map and value=1 to set figure=Scatter_Map. We can tell Dash to do this by adding the following code to our app after the layout:

@app.callback(
    Output('main-figure','figure'),
    [Input('fig-slider','value')])
def slider_interaction(slider_val):
    if slider_val==0:
        fig=Map_Fig
    else:
        fig=Scatter_Fig

    return fig

Run the app once again and verify that the callback is working properly by moving the slider.

Note: The app.callback is a decorator that adds further functionality to the function slider-interaction so that it can be used by Dash to update the webpage according to the changes in its components.

The new app should look like this:

SegmentLocal

The final code of this app is the following.

import dash
from dash.dependencies import Input, Output, State, ClientsideFunction
import dash_html_components as html
import dash_core_components as dcc
import plotly.express as px
import pandas as pd
import json

##################################################################################################
#Load the data and create the map
##################################################################################################
df = pd.read_csv('Data/superstore.csv', parse_dates=['Order Date', 'Ship Date'])

with open('Data/us.json') as geo:
    geojson = json.loads(geo.read())

with open('Data/states.json') as f:
    states_dict = json.loads(f.read())

df['State_abbr'] = df['State'].map(states_dict)


#Create the map:
dff=df.groupby('State_abbr').sum().reset_index()
Map_Fig=px.choropleth_mapbox(dff,                         
        locations='State_abbr',                   
        color='Sales',                            
        geojson=geojson,                          
        zoom=3,                                   
        mapbox_style="carto-positron",            
        center={"lat": 37.0902, "lon": -95.7129}, 
        color_continuous_scale="Viridis",         
        opacity=0.5,                              
        )

#Create the scatter plot:
Scatter_Fig=px.scatter(df, x="Sales", y="Profit", color="Category", hover_data=['State','Sub-Category','Order ID','Product Name'])  

################################################################################################

#Create the app
app = dash.Dash(__name__)

#Create Layout
app.layout = html.Div([
    html.H2("US Sales Map", id='title'), #Creates the title of the app
    dcc.Graph(figure=Map_Fig, id='main-figure'),
    dcc.Slider(min=0,max=1,marks={0:'US Map', 1:'Scatter Plot'},value=0, id='fig-slider')  

])

@app.callback(
    Output('main-figure','figure'),
    [Input('fig-slider','value')])
def slider_interction(slider_val):
    if slider_val==0:
        fig=Map_Fig
    else:
        fig=Scatter_Fig

    return fig 


#Initiate the server where the app will work
if __name__ == "__main__":
    app.run_server(debug=True)

Styling and improving our dashboard (75 min)

Congrats! You have already learned the most important functionalities of Dash! However, as you can see from Exercise 4, the resulting app does not look very pleasant to the eye and if we keep adding further functionalities and plots, our code will start to look messy pretty soon. Luckily, we can tidy the code of our app pretty easily by dividing the layout of our app into different files. For instance, let's take a look at the file tree of the App folder provided with the case:

App 
|   app.py
|   index.py
|     
+---assets
|   |   ds4a-img.svg
|   |   ds4a_styles.css
|   |   
|           
+---data
|   |   states.json
|   |   superstore.csv
|   |   us.json
|   |   
|
+---lib
|   |   sidebar.py
|   |   stats.py
|   |   styles.py
|   |   title.py
|   |   us_map.py
|   |   __init__.py  
|

We have structured this file system so that all of the different parts of the layout of our app lie in the folder lib, some basic styling and figures lie in the folder assets, and the data used in the app is located in the folder data. The callbacks and layout definition will be located in the index.py file, which is the one we will run from the terminal. The way the file structure works is not trivial, but let's not worry too much about it now (you can review it afterwards).

Dash Bootstrap Components and Cascading Style Sheets (10 min)

Styles are added to HTML webpages via CSS files. These files describe how the different HTML components are displayed on the screen. For instance, let's take a look at the following style classes from the file ds4a_styles.csslocated in the assets folder (note that every time you want to use a custom Cascading Style Sheets (CSS) file or import images, you have to add them to a folder named assets):

.ds4a-title {
    position: fixed;
    top: 0;
    left: 0;
    z-index: 999;
    height: 120px;
    width: 100%;
    margin-left: 16rem;
    margin-right: 2rem;
    padding: 2rem 1rem;
    background-color: #F8F9F9;
}

.ds4a-sidebar {
    position: fixed;
    top: 0;
    left: 0;
    bottom: 0;
    width: 16rem;
    padding: 2rem 1rem;
    background-color: #1A5276; /*#EBEDEF;*/
}

Here we have a pair of CSS classes that will give style to the title and sidebar of our app. You can see that the sidebar has width 16rem and the title will have a margin of 16rem - we want no other component of our app to clash with the sidebar! In addition, we have set them to be in fixed position so that they are always visible.

Of course, defining your own CSS classes for each component of the app can be a little tedious. In addition, handling the position of the components can sometimes be a little cumbersome. Luckily for us, there exist several CSS libraries that we can use for free. We will use the Bootstrap CSS and Dash Bootstrap Components libraries, which we can then add to our app as follows:

import dash
import dash_bootstrap_components as dbc 

app = dash.Dash(__name__, external_stylesheets = [dbc.themes.BOOTSTRAP]) #USING BOOTSTRAP'S CSS LIBRARY
server = app.server

#We need this for function callbacks not present in the app.layout
app.config.suppress_callback_exceptions = True

Now, run the index.py file. You should see the following app:

SegmentLocal

One of the great advantages of the Bootstrap library is that it uses a grid system to layout the components on the screen. It uses twelve columns that can be resized and relocated if desired. Let's use the grid system to center the title of our dashboard by editing the title.py file and adding the following:

title=html.Div(className="ds4a-title", 
    children=[
        dbc.Row([
            dbc.Col(
                html.H1("US Sales Dashboard"),
                width={"size": 6, "offset": 3}
                )
        ])
    ],
    id="title")

Here, we have created a single row and column of the grid. We resized the column by 6 units and moved it 3 units to the right.

Exercise 5: (20 min)

5.1 (10 min)

Next, let's start populating our sidebar. Add one dcc.DatePickerRange and one dcc.Dropdown component into the sidebar. The dropdown should allow multiple selections of states in the US. Use the IDs date_picker and state_dropdown for these components.

Hint: Read the documentation. The min and max dates of sales in our dataset are dt(2014, 1, 2) and dt(2017, 12, 31). For the value and options properties of the dropdown, you may want to load the state.json as states then use the code:

options=[{"label":key, "value":states[key]} for key in states.keys()],
value=['CA','NY'],

You may want to add some words on top of each component as well.

Answer. The new app should look like this:

SegmentLocal

The updates to sidebar.py are the following:

#############################################################################
# State Dropdown 
#############################################################################
with open('Data/states.json') as f:
    states = json.loads(f.read())

dropdown=dcc.Dropdown(
        id="state_dropdown",
        options=[{"label":key, "value":states[key]} for key in states.keys()],
        value=["NY",'CA'],
        multi=True
        )

##############################################################################
# Date Picker 
##############################################################################
date_picker=dcc.DatePickerRange(
                id='date_picker',
                min_date_allowed=dt(2014, 1, 2),
                max_date_allowed=dt(2017, 12, 31),
                start_date=dt(2016,1,1).date(),
                end_date=dt(2017, 1, 1).date()
            )

#############################################################################
# Sidebar Layout
#############################################################################
sidebar=html.Div(
    [   DS4A_Img, #Add the DS4A_Img located in the assets folder
        html.Hr(), #Add an horizontal line

        ####################################################
        #Place the rest of Layout here
        ####################################################
        html.H5("Select dates"),
        date_picker,
        html.Hr(),
        html.H5("Select states"),
        dropdown,
        html.Hr(),
    ],className='ds4a-sidebar'    
)

5.2 (10 min)

Next, let's add the choropleth map from earlier into the US_map.py file. Then, add the line plot and scatterplot from earlier into the stats.py file. These plots should be situated side-by-side. Use the IDs US_map, Line and Scatter.

Hint: If you want to change the color of the background in each figure, you can use the following:

name_of_figure.update_layout(title='Title of plot',paper_bgcolor="#F8F9F9")

Answer. The new app should look like this:

SegmentLocal

The updates to US_map.py are:

#############################
# Load map data
#############################
df = pd.read_csv('Data/superstore.csv', parse_dates=['Order Date', 'Ship Date'])
with open('Data/us.json') as geo:
    geojson = json.loads(geo.read())
with open('Data/states.json') as f:
    states_dict = json.loads(f.read())
df['State_abbr'] = df['State'].map(states_dict)

#Create the map:
dff=df.groupby('State_abbr').sum().reset_index()
Map_Fig=px.choropleth_mapbox(dff,                         
        locations='State_abbr',                   
        color='Sales',                            
        geojson=geojson,                          
        zoom=3,                                   
        mapbox_style="carto-positron",            
        center={"lat": 37.0902, "lon": -95.7129}, 
        color_continuous_scale="Viridis",         
        opacity=0.5,                              
        )
Map_Fig.update_layout(title='US map',paper_bgcolor="#F8F9F9")

##############################
#Map Layout
##############################
map=html.Div([
 #Place the main graph component here:
  dcc.Graph(figure=Map_Fig, id='US_map')
], className="ds4a-body")

The updates to stats.py file are:

##############################################################
# SCATTER PLOT
###############################################################

Scatter_fig=px.scatter(df, x="Sales", y="Profit", color="Category", hover_data=['State','Sub-Category','Order ID','Product Name'])  
Scatter_fig.update_layout(title='Sales vs. Profit in selected states',paper_bgcolor="#F8F9F9")


###############################################################
# LINE PLOT
###############################################################

df['Order_Month'] = pd.to_datetime(df['Order Date'].map(lambda x: "{}-{}".format(x.year, x.month)))

#Next, we filter the data by month and selected states
states=['California', 'Texas','New York']

ddf=df[df['State'].isin(states)]
ddf=ddf.groupby(['State','Order_Month']).sum().reset_index()

Line_fig=px.line(ddf,x="Order_Month",y="Sales", color="State")
Line_fig.update_layout(title='Montly Sales in selected states',paper_bgcolor="#F8F9F9")

#################################################################################
# Here the layout for the plots to use.
#################################################################################
stats=html.Div([ 
    #Place the different graph components here.
    dbc.Row([
        dbc.Col(
            dcc.Graph(figure=Line_fig, id='Line')
        ),
        dbc.Col(
            dcc.Graph(figure=Scatter_fig, id='Scatter')
            )

    ]),
    ],className="ds4a-body")

Exercise 6: (40 min)

It's now time to add some interactivity to our app! By the end of this exercise, our app should look like this:

SegmentLocal

6.1 (15 min)

Create a callback function that, upon selection of dates, updates the total sales amount in the choropleth map. Add this callback after the definition of the layout in the index.py file.

Hint: You can add multiple inputs to a callback. For instance, you can start your callback with the decorator:

@app.callback(
    Output("US_map", "figure"),
    [
        Input("date_picker", "start_date"),
        Input("date_picker", "end_date")
    ],
)

Answer. Here is one possible solution:

#We add the following code to our index.py file:
@app.callback(
    Output("US_map", "figure"),
    [
        Input("date_picker", "start_date"),
        Input("date_picker", "end_date")
    ],
)
def update_map(start_date,end_date):
    dff = df[(df['Order Date'] >= start_date) & (df['Order Date'] < end_date)] # We filter our dataset for the daterange
    dff=dff.groupby("State_abbr").sum().reset_index()
    fig_map2=px.choropleth_mapbox(dff,
        locations='State_abbr',
        color='Sales',
        geojson=geojson, 
        zoom=3, 
        mapbox_style="carto-positron", 
        center={"lat": 37.0902, "lon": -95.7129},
        color_continuous_scale="Viridis",
        opacity=0.5,
        title='US Sales'
        )
    fig_map2.update_layout(title="US State Sales",margin={"r":0,"t":0,"l":0,"b":0}, paper_bgcolor="#F8F9F9", plot_bgcolor="#F8F9F9",)
    return fig_map2

6.2 (15 min)

Create a callback function that updates the scatterplot and line plot according to the date and state selections in the sidebar.

Hint: Starting from Dash version 0.39, multiple outputs are allowed! You may start your callback as follows:

@app.callback(
    [Output("Line", "figure"),Output("Scatter","figure")],
    [
        Input("state_dropdown", "value"),
        Input("date_picker", "start_date"),
        Input("date_picker", "end_date")
    ],
)

Answer. One possible solution is shown below:

#We add the following callback:
@app.callback(
    [Output("Line", "figure"),Output("Scatter","figure")],
    [
        Input("state_dropdown", "value"),
        Input("date_picker", "start_date"),
        Input("date_picker", "end_date")
    ],
)
def make_line_plot(state_dropdown, start_date, end_date):
    ddf=df[df['State_abbr'].isin(state_dropdown)]
    ddf = ddf[(ddf['Order Date'] >= start_date) & (ddf['Order Date'] < end_date)] 

    ddf1=ddf.groupby(['Order_Month', 'State']).sum()
    ddf1=ddf1.reset_index()

    Line_fig=px.line(ddf1,x="Order_Month",y="Sales", color="State")
    Line_fig.update_layout(title='Montly Sales in selected states',paper_bgcolor="#F8F9F9")

    Scatter_fig=px.scatter(ddf, x="Sales", y="Profit", color="Category", hover_data=['State_abbr','Sub-Category','Order ID','Product Name'])  
    Scatter_fig.update_layout(title='Sales vs. Profit in selected states',paper_bgcolor="#F8F9F9")

    return [Line_fig, Scatter_fig]

Note: What happens if you remove all states from the dropdown? What happens if you select a date that is not in the range of the DataFrame? This cases should be taken care of carefully! We can avoid these scenarios by modifying our callbacks. Another possibility is to use the PreventUpdate action from the dash.exceptions library.

6.3 (10 min)

Finally, let's allow the user to add states to the dropdown menu by clicking on the map. This can be done using the clickData property of graph components.

Hint: First try to create a callback that prints the clickData of the US_map graph to the terminal. You can also input current states of components into a callback function using State('component_id','component_property').

Answer. Here is one possible solution:

@app.callback(
    Output('state_dropdown','value'),
    [
        Input('US_map','clickData')
    ],
    [
        State('state_dropdown','value')
    ]

)
def click_saver(clickData,state):
    if clickData is None:
        raise PreventUpdate

    #print(clickData)

    state.append(clickData['points'][0]['location'])

    return state

Exercise 7: (3 min)

We now have a fully functional map which updates as the period selected changes. Let's try to answer the following business questions:

  1. Which state had the highest sales in December of 2015?
  2. Which state had the lowest sales in the year 2015?

Answer. Montana had the highest sales in December 2015, and Maine had the lowest sales for the year of 2015.

Conclusions (2 min)

In this case, we built an interactive dashboard that allows business users to gain insight into sales and profit performance by category across different states. This dashboard allowed us to deep-dive into each state via selection on a map, and greatly eased communication of our findings to a non-technical audience.

Takeaways (3 min)

In this case, you learned about and put together a simple Dash app with the following features:

  1. App layout components, such as maps and plots, using Dash HTML and Dash core components
  2. A pandas data source for those components
  3. User input capabilities
  4. Callback functions that automatically manipulated the components based on the user input

This dashboard can then be deployed to an AWS server or a company's private intranet for use by all stakeholders.